16 research outputs found

    Studying the functional conservation of cis-regulatory modules and their transcriptional output

    Background: Cis-regulatory modules (CRMs) are distinct genomic regions surrounding a target gene that can independently activate the promoter to drive transcription. The activation of a CRM is controlled by the binding of a certain combination of transcription factors (TFs). It would be of great benefit if the transcriptional output mediated by a specific CRM could be predicted. Of equal benefit would be identifying in silico a specific CRM as the driver of expression in a specific tissue or situation. We extend a recently developed biochemical modeling approach to handle both prediction tasks. Given a set of TFs, their protein concentrations, and the positions and binding strengths of each of the TFs in a putative CRM, the model predicts the transcriptional output of the gene. Our approach predicts the location of the regulating CRM by using predicted TF binding sites in regions near the gene as input to the model and searching for the region that yields a predicted transcription rate most closely matching the known rate.
    Results: We demonstrate the model on MSE2, one of the CRMs regulating the eve gene. A model trained on MSE2 in D. melanogaster was applied to the sequence surrounding the eve gene in seven other Drosophila species. The model successfully predicts the correct MSE2 location and output in six out of the eight Drosophila species we examine.
    Conclusion: The model is able to generalize from D. melanogaster to other Drosophila species and accurately predicts the location and transcriptional output of MSE2 in those species. However, we also show that the current model is not specific enough to function as a genome-wide CRM scanner, because it incorrectly predicts other genomic regions to be MSE2s.
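The search step described in this abstract (slide over regions near the gene and keep the one whose predicted rate is closest to the known rate) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Site`, `toy_rate`, and `best_window` are hypothetical names, and `toy_rate` is a stand-in placeholder, not the biochemical model the paper uses.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Site(NamedTuple):
    position: int    # start coordinate of a predicted TF binding site
    tf: str          # transcription factor name
    strength: float  # predicted binding strength


def toy_rate(sites: List[Site], conc: Dict[str, float]) -> float:
    """Placeholder rate: concentration-weighted sum of site strengths.

    ASSUMPTION: this is NOT the paper's biochemical model, only a stand-in
    so that the scanning loop below can be run.
    """
    return sum(conc.get(s.tf, 0.0) * s.strength for s in sites)


def best_window(
    sites: List[Site],
    region_length: int,
    window: int,
    step: int,
    known_rate: float,
    rate_fn: Callable[[List[Site]], float],
) -> Optional[Tuple[int, float]]:
    """Return (window start, absolute error) for the best-matching window."""
    best: Optional[Tuple[int, float]] = None
    for start in range(0, region_length - window + 1, step):
        # Collect the predicted sites that fall inside the current window.
        inside = [s for s in sites if start <= s.position < start + window]
        error = abs(rate_fn(inside) - known_rate)
        # Keep the first window with the strictly smallest error.
        if best is None or error < best[1]:
            best = (start, error)
    return best
```

With a real model, `rate_fn` would wrap the trained predictor; the scan itself does not depend on how the rate is computed.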

    Hes3 is expressed in the adult pancreatic islet and regulates gene expression, cell growth, and insulin release

    The transcription factor Hes3 is a component of a signaling pathway that supports the growth of neural stem cells, with profound consequences in neurodegenerative disease models. Here we explored whether Hes3 also regulates pancreatic islet cells. We showed that Hes3 is expressed in human and rodent pancreatic islets. In mouse islets it co-localizes with alpha and beta cell markers. We employed the mouse insulinoma cell line MIN6 to perform in vitro characterization and functional studies in conditions known to modulate Hes3, based upon our previous work using neural stem cell cultures. In these conditions, cells showed elevated Hes3 expression and nuclear localization, grew efficiently, and showed higher evoked insulin release responses compared with serum-containing conditions. They also exhibited higher expression of the transcription factor Pdx1 and insulin. Furthermore, they were responsive to pharmacological treatments with the GLP-1 analog Exendin-4, which increased nuclear Hes3 localization. We employed a transfection approach to address specific functions of Hes3. Hes3 RNA interference opposed cell growth and affected gene expression as revealed by DNA microarrays. Western blotting and PCR approaches specifically showed that Hes3 RNA interference opposes the expression of Pdx1 and insulin. Hes3 overexpression (using a Hes3-GFP fusion construct) confirmed a role of Hes3 in regulating Pdx1 expression. Hes3 RNA interference reduced evoked insulin release. Mice lacking Hes3 exhibited increased islet damage by streptozotocin. These data suggest roles of Hes3 in pancreatic islet function.

    Ortho2ExpressMatrix—a web server that interprets cross-species gene expression data by gene family information

    Background: The study of gene families is pivotal for understanding gene evolution across different organisms, and this phylogenetic background is often used to infer the biochemical functions of genes. Modern high-throughput experiments make it possible to analyze the entire transcriptome of an organism; however, it is often difficult to deduce functional information from that data.
    Results: To improve the functional interpretation of gene expression we introduce Ortho2ExpressMatrix, a novel tool that integrates complex gene family information, computed from sequence similarity, with comparative gene expression profiles of two pre-selected biological objects: gene families are displayed as two-dimensional matrices. Parameters of the tool are object type (two organisms, two individuals, two tissues, etc.), type of computational gene family inference, experimental meta-data, microarray platform, gene annotation level and genome build. Family information in Ortho2ExpressMatrix is based on computationally different protein family approaches such as EnsemblCompara, InParanoid, SYSTERS and Ensembl Family. Currently, the respective all-against-all associations are available for five species: human, mouse, worm, fruit fly and yeast. Additionally, microRNA expression can be examined with respect to miRBase or TargetScan families. The visualization typical of Ortho2ExpressMatrix is a matrix view that displays functional traits of genes (differential expression) as well as sequence similarity of protein family members (BLAST e-values) in colour codes. Such translations are intended to facilitate the user's perception of the research object.
    Conclusions: Ortho2ExpressMatrix integrates gene family information with genome-wide expression data in order to enhance the functional interpretation of high-throughput analyses of diseases, environmental factors, genetic modification or compound treatment experiments. The tool explores differential gene expression in the light of orthology, paralogy and the structure of gene families, up to the point of ambiguity analyses. Results can be used for filtering and prioritization in functional genomic, biomedical and systems biology applications. The web server is freely accessible at http://bioinf-data.charite.de/o2em/cgi-bin/o2em.pl.

    ELM: the status of the 2010 eukaryotic linear motif resource

    Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, and cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.

    SIRW: a web server for the Simple Indexing and Retrieval System that combines sequence motif searches with keyword searches

    SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR), which is capable of parsing and indexing various flat file databases. In addition, it provides a framework for sequence analysis (e.g. motif pattern searches) on biological sequences selected through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.
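The general idea of combining a keyword search with a motif pattern search can be illustrated with a minimal sketch. It does not use or reproduce the SIRW or SIR interface: the record layout (`id`, `description`, `sequence`), the function name `keyword_then_motif`, and the use of a Python regular expression as the motif syntax are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple


def keyword_then_motif(
    records: List[Dict[str, str]],
    keyword: str,
    motif: str,
) -> List[Tuple[str, int, str]]:
    """Select records by keyword, then report motif matches in their sequences.

    ASSUMPTION: each record is a dict with 'id', 'description' and 'sequence'
    keys; this layout is hypothetical and not taken from SIRW.
    """
    pattern = re.compile(motif)
    hits: List[Tuple[str, int, str]] = []
    for rec in records:
        # Step 1: case-insensitive keyword filter on the free-text description.
        if keyword.lower() not in rec["description"].lower():
            continue
        # Step 2: motif scan on the selected sequence.
        # finditer reports non-overlapping matches only.
        for m in pattern.finditer(rec["sequence"]):
            hits.append((rec["id"], m.start(), m.group()))
    return hits
```

For example, the regular expression `N[^P][ST][^P]` corresponds to the classic N-glycosylation site pattern; running it with the keyword `kinase` would return only matches in records whose description mentions that word.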

    Multiple sequence alignment with the Clustal series of programs

    The Clustal series of programs are widely used in molecular biology for the multiple alignment of both nucleic acid and protein sequences and for preparing phylogenetic trees. The popularity of the programs depends on a number of factors, including not only the accuracy of the results, but also the robustness, portability and user-friendliness of the programs. New features include NEXUS and FASTA format output, printing range numbers and faster tree calculation. Although Clustal was originally developed to run on a local computer, numerous Web servers have been set up, notably at the EBI (European Bioinformatics Institute) (http://www.ebi.ac.uk/clustalw/).

    Finding biomedical categories in Medline®

    Background: There are several human-defined ontologies relevant to Medline. However, Medline is a fast-growing collection of biomedical documents, which makes it difficult to update and expand these ontologies. Automatically identifying meaningful categories of entities in a large text corpus is useful for information extraction, construction of machine learning features, and development of semantic representations. In this paper we describe and compare two methods for automatically learning meaningful biomedical categories in Medline. The first approach is a simple statistical method that uses part-of-speech and frequency information to extract a list of frequent nouns from Medline. The second method implements an alignment-based technique to learn frequent generic patterns that indicate a hyponymy/hypernymy relationship between a pair of noun phrases. We then apply these patterns to Medline to collect frequent hypernyms as potential biomedical categories.
    Results: We study and compare these two alternative sets of terms to identify semantic categories in Medline. We find that both approaches produce reasonable terms as potential categories. We also find that there is significant agreement between the two sets of terms. The overlap between the two methods improves our confidence in the categories predicted by these independent methods.
    Conclusions: This study is an initial attempt to extract categories that are discussed in Medline. Rather than imposing external ontologies on Medline, our methods allow categories to emerge from the text.